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Accomplishments :  During  the  project  year  ending  August  31,  1987  there  were  three 

areas  in  which  work  was  carried  out  and  results  obtained.  These  relate  to 
nonMarkovian  extensions  of  the  Markov  Monte  Carlo  simulation  methods,  graphical 
presentations  of  Monte  Carlo  results,  and  prediction  of  time  to  failure 
distributions  of  brittle  components. 

We  have  completed  the  implementation  of  the  generalization  of  our  earlier 
inhomogeneous  Markov  Monte  Carlo  code  to  treat  parts  replacement  problems  for 
which  the  Markov  property  is  lost.  The  resulting  method  retains  the  use  of  the 
self-transition  sampling1  to  model  time-dependent  failure  rates  and  is  fully 
compatible  with  our  variance  reduction  techniques. 

The effectiveness of the new techniques has been demonstrated by applying
them to two classes of problems. In the first, comparisons are made between
batch replacement and time replacement policies on redundant configurations of
components. This work was recently presented at an international topical meeting
on probabilistic risk assessment; a reprint is included as an appendix. In
addition, we have used our code to make reliability simulations of a widely used
redundant configuration for flight-critical avionics systems: the trimodular
redundant (TMR) system with reconfigurable spares. This work will be reported
at the Annual Reliability and Maintainability Symposium to be held in January
1988.

The second area was unanticipated at the time the proposal was written. It
is the development of an effective means for the graphical presentation of Monte
Carlo results. An argument in favor of analytical or deterministic numerical
methods for the analysis of Markov processes has been that they yield
results in the form of time-dependent curves, while Monte Carlo simulation yields
only a single result at a specified time. Since a great deal of insight into the
nature of the solution is then lost, Monte Carlo is often relegated to a method of
last resort for otherwise intractable problems. We have circumvented this
problem by treating the Monte Carlo simulation as a set of grouped life-test data
and employing nonparametric methods to generate curves of reliability or
availability vs time. The resulting techniques increase the computation times
over those required for a result at a specified time by only a few percent. They
were employed to generate the curves shown in the appendix and will be reported
in a short paper.

The third area involves the development of methods to generate the
time-dependent failure rate curves needed to estimate wear or aging effects in Monte
Carlo or deterministic treatments of reliability problems. We are focusing our
efforts on the mathematical representation of fatigue failures of brittle
mechanical components. We have tentatively constructed a model in which finite
element results can be represented as probability density functions of stress,
which in turn can be incorporated into Monte Carlo reliability simulations.

Personnel : In addition to the principal investigator, the contract continued to
support a graduate student, Franz Boehm, who is seeking the PhD in mechanical
engineering. The research was also assisted by an MS student, Mr. Uve
Hald, who received no support from the AFOSR contract.




Travel :  During  the  year  the  principal  investigator  made  a  one  day  visit  to  the 
Rome  Air  Development  Center  to  confer  with  personnel  in  the  Reliability  and 
Maintainability  Section,  and  he  attended  the  Annual  Reliability  and 
Maintainability  Symposium.  These  visits  were  instrumental  in  bringing  about  the 
work  on  the  TMR  system  described  above.  In  addition,  part  of  the  principal 
investigator's  summer  appointment  at  the  University  of  Stuttgart  was  spent 
conferring  with  Dr.  Lauf  and  others  in  developing  the  third  area  listed  above. 
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GENERALIZATION  OF  MARKOV  MONTE  CARLO  RELIABILITY  ANALYSIS 
TO  INCLUDE  NONMARKOVIAN  MAINTENANCE  STRATEGIES 

F. Boehm, U. P. Hald, E. E. Lewis and Z. Tu
(Department of Mechanical Engineering
Northwestern University, Evanston, IL 60201, USA)


ABSTRACT


The Lagrangian approach to Markov Monte Carlo methods for systems reliability
analysis is generalized to include nonMarkovian phenomena in which system
components are replaced. The method is then employed to analyze the unreliability
and unavailability of a number of redundant systems in which maintenance
is carried out by batch or time replacement of aging components.


INTRODUCTION 


The Lagrangian approach to Markov Monte Carlo methods has been shown to be
very effective for estimating reliability and availability of complex systems.
The ability to treat general component dependencies in multicomponent systems,
coupled with the use of variance reduction techniques to greatly increase sampling
efficiency, results in highly efficient algorithms, capable of treating
Markov models that would be intractable by deterministic computational
methods. More recently, the Monte Carlo formulation has been generalized
through a nonanalog sampling technique called self-transitions to treat
time-inhomogeneous Markov processes. This has allowed the replacement of constant
failure rates with more realistic "bathtub" curves, thereby permitting the
simulation of component wear and periodic preventive maintenance.

In  a  variety  of  problems,  some  critical  to  reactor  safety,  departures  from 
Markov  models  are  required.  For,  if  as-good-as-new  repair  or  parts  replacement 
are  permitted  following  revealed  failures,  the  Markov  property  14  is  lost.  This 
is illustrated by the failure rate curves in Fig. 1. The solid line (curve c)
represents the failure rate (with preventive maintenance) in a time-inhomogeneous
Markov calculation. Curve (c) is a reasonable approximation to the
as-good-as-old repair (curve b) since the time between failure and repair ($t_r - t_f$)
normally is small. However, for as-good-as-new repair (curve a) faithful
modeling requires that the failure rate curve be reinitialized at $t_f$. Moreover,
if  age  (as  opposed  to  batch)  replacement  policies  are  to  be  studied,  the  times 
at  which  preventive  replacement  is  carried  out  then  also  depend  on  the  time  of 
the  last  component  failure. 

In this paper earlier work in applying Monte Carlo techniques to the
evaluation of Markov reliability models is generalized to systems in which
the Markov property must be violated in order to retain the age of each
replaceable component in the simulation. Only in this way can classes of
reliability problems that combine component wear, preventive maintenance, and
parts replacement be treated. Such analysis is required, for example, to
determine the effects of alternative maintenance policies on the reliability and
availability of highly redundant nuclear safety systems.


Figure 1: Failure rate curves showing three models for repair of
revealed failures: (a) as-good-as-new; (b) as-good-as-old;
(c) continuous wear.


THEORY 

The generalization of the Monte Carlo formalism to treat nonMarkovian
renewal processes can be most compactly summarized by retaining the framework
used in simulating reliability problems represented as continuous-time,
inhomogeneous Markov processes. Let $p_k(t)$ be the probability that a system is
in state $k$ at time $t$, where each of the $2^n$ states for an $n$-component system
constitutes a unique combination of operating and failed components. The
equations to be simulated are then

$$\frac{d}{dt} p_k(t) = -\gamma_k(t)\, p_k(t) + \sum_{k'} q(k|k',t)\, \gamma_{k'}(t)\, p_{k'}(t), \qquad (1)$$

with initial conditions $p_k(0)$ specified. The transition rate $\gamma_k(t)$ out of state $k$
is given by

$$\gamma_k(t) = \sum_{l \in O_k} \lambda_l(t) + \sum_{l \in F_k} \mu_l, \qquad (2)$$

where $\lambda_l(t)$ and $\mu_l$ are the failure and repair rates of component $l$ while in
state $k$, and $O_k$ and $F_k$ are the sets of operational and failed components,
respectively, in state $k$. The quantity $q(k|k',t)$ is the conditional probability
that, given a transition out of state $k'$ at time $t$, the new state will be $k$; it
may be written as

$$q(k|k',t) = \gamma_{kk'}(t)/\gamma_{k'}(t), \qquad (3)$$

where $\gamma_{kk'}(t)$ is the component failure or repair rate that corresponds to
the $k' \to k$ transition.
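As an illustration of Eqs. (2) and (3), the following minimal sketch computes the transition rate out of a state and the probability that each component is the one that changes. The state encoding (a tuple of booleans, `True` meaning operating) and the function names are assumptions made for this example; they are not taken from the authors' code.

```python
def transition_rate(state, t, failure_rates, repair_rates):
    """Total rate out of a state, in the spirit of Eq. (2).

    state         -- tuple of bools, True if the component is operating (assumed encoding)
    failure_rates -- list of callables, failure_rates[l](t) for component l
    repair_rates  -- list of constant repair rates, one per component
    """
    total = 0.0
    for l, operating in enumerate(state):
        # Operating components contribute a failure rate, failed ones a repair rate.
        total += failure_rates[l](t) if operating else repair_rates[l]
    return total


def transition_probabilities(state, t, failure_rates, repair_rates):
    """Probability that component l causes the next transition, in the spirit of Eq. (3).

    Each possible new state differs from the current one in exactly one
    component, so the list is indexed by that component.
    """
    gamma = transition_rate(state, t, failure_rates, repair_rates)
    return [
        (failure_rates[l](t) if operating else repair_rates[l]) / gamma
        for l, operating in enumerate(state)
    ]
```

For a two-component system with one unit failed, most of the probability goes to the repair transition whenever the repair rate is much larger than the failure rates.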

In Markov Monte Carlo each of the $N$ trials consists of following the state
transitions through some finite period of time, say the design life of the
system, $T$. From time $t'$ and state $k'$, the time of the next transition is
sampled from the cumulative probability distribution

$$F(t|t',k') = 1 - \exp\left\{-\int_{t'}^{t} \gamma_{k'}(t'')\,dt''\right\}. \qquad (4)$$

In analog Monte Carlo simulation of homogeneous Markov processes, in which the
failure rates are constant, the time can be sampled using a uniformly distributed
random number in the direct inverse method. The sampling is modified by
the use of self-transitions if it is necessary to treat the time-dependent
failure rates that appear with wear or early failures. For computational
efficiency nonanalog variance reduction normally is employed. In this, the
sampling distribution of Eq. (4) is modified to force more transitions, and each
trial then carries a weight which is appropriately altered to maintain unbiased
estimates of the system unreliability or unavailability. After each transition
a second random number is generated to sample Eq. (3) and determine the new
system state $k$. Once again, nonanalog variance reduction is employed to
enhance the number of failures and suppress the number of repairs. The
resulting biasing of the state transition matrix is compensated by
a change in the trial weight to maintain an unbiased reliability or availability
estimate.
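The sampling of Eq. (4) for a time-dependent rate can be sketched as follows. This is an analog illustration only, without variance reduction, and it assumes that the self-transition idea can be realized by the familiar thinning construction: candidate events are drawn at a constant bounding rate by direct inversion, and a candidate is accepted as a real transition with probability equal to the ratio of the true rate to the bound; rejected candidates leave the state unchanged. The names `sample_transition_time` and `rate_bound` are hypothetical, and the authors' implementation may differ in detail.

```python
import math
import random


def sample_transition_time(t_start, rate, rate_bound, t_end, rng):
    """Sample the next real transition time after t_start, or None if none occurs before t_end.

    rate       -- callable giving the transition rate out of the current state at time t
    rate_bound -- constant with 0 <= rate(t) <= rate_bound on [t_start, t_end] (assumption)
    rng        -- random.Random instance
    """
    t = t_start
    while True:
        # Direct inversion for a constant rate: exponential waiting time.
        t += -math.log(1.0 - rng.random()) / rate_bound
        if t >= t_end:
            return None
        # Accept as a real transition with probability rate(t) / rate_bound;
        # otherwise treat the event as a self-transition and keep going.
        if rng.random() < rate(t) / rate_bound:
            return t
```

With a constant rate equal to the bound, every candidate is accepted and the routine reduces to the direct inverse method mentioned above.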

To  determine  when  system  failure  has  resulted  from  state  transition,  a 
fault  tree  describing  the  component  configuration  is  evaluated  qualitatively 
either  by  bottom  up  evaluation  or  by  using  cut  sets.  The  tally  for  the 
unreliability  is 
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where  wn  is  the  weight  of  the  nth  trial  at  the  time  of  system  failure,  if  system 
failure  for  that  trial  occurs  at  tn<T.  The  corresponding  mission  availability 
estimator  is 
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where  wn  is  the  history  weight  at  the  time  of  the  first  system  failure.  Since 
in  our  algorithms  the  calculation  reverts  to  analog  Monte  Carlo  after  the  first 
system  failure  in  a  trial,  the  weight  at  the  time,  tn,  of  the  first  failure  is 
multiplied  by  A  the  total  down  time  for  the  duration  of  the  trial.  For  both 
unreliability and unavailability the sample variance is tallied along with Eqs.
(5) or (6), and the central limit theorem is used to estimate 68% confidence
intervals for the results.
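The tallying step can be illustrated with a short sketch. It assumes that each trial contributes a single score: the trial weight if system failure occurs before the design life, and zero otherwise. The estimate is then the sample mean, and the 68% confidence interval is taken as one standard error on either side, as the central limit theorem suggests. The function name and the score convention are assumptions for this example rather than the report's exact estimators.

```python
import math


def tally(scores):
    """Return (mean, standard_error) for a list of per-trial scores.

    scores -- one number per trial: the trial weight at system failure,
              or 0.0 if the system survived to the design life (assumed convention)
    """
    n = len(scores)
    mean = sum(scores) / n
    # Unbiased sample variance of the scores.
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    # One standard error corresponds to roughly a 68% confidence interval.
    return mean, math.sqrt(variance / n)
```

A result would then be quoted as the mean plus or minus the standard error, which is the form used in the tables of this paper.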
Component replacement or renewal is incorporated into the above formalism
by replacing $t$ with a vector $\underline{\tau}$ in the transition rates and probabilities
appearing in the Markov equations. The $i$th component of the vector $\underline{\tau}$ is just
the time since the $i$th component in the system was replaced, or underwent
as-good-as-new maintenance. Equation (2) for the $k$th state transition rate is thus
replaced by
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(7) 


where $\tau_i$ is the age of the $i$th component. Likewise, the transition probability
is now given by

$$q(k|k',\underline{\tau}) = \gamma_{kk'}(\underline{\tau})/\gamma_{k'}(\underline{\tau}).$$

The ability to track Monte Carlo trials in which the age of each component
enters the transition probabilities has been built
into our Monte Carlo simulations. Moreover, this generalization places no
limitations on the importance sampling techniques presently in use or on
the use of the self-transition technique for treating time-dependent failure
rates.
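A minimal sketch of the bookkeeping this implies is given below. It assumes that the simulation stores, for each component, the time of its last renewal, and that failure rates are evaluated at the component age rather than at the system time. The class and function names are hypothetical.

```python
class ComponentAges:
    """Tracks the time of last renewal for each component (illustrative helper)."""

    def __init__(self, n_components):
        # All components are assumed new at time zero.
        self.last_renewal = [0.0] * n_components

    def renew(self, i, t):
        """Record a replacement or as-good-as-new repair of component i at time t."""
        self.last_renewal[i] = t

    def age(self, i, t):
        """Age of component i at system time t."""
        return t - self.last_renewal[i]


def transition_rate_with_ages(state, t, ages, failure_rates, repair_rates):
    """Transition rate out of a state when failure rates depend on component age."""
    total = 0.0
    for i, operating in enumerate(state):
        if operating:
            total += failure_rates[i](ages.age(i, t))
        else:
            total += repair_rates[i]
    return total
```

Because each component carries its own age, the rate out of a state depends on the history of the trial and not only on the current state and time, which is where the Markov property is lost.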


RESULTS 


To demonstrate the use of the component renewal feature of Monte Carlo
reliability analysis, simulations have been made for a number of redundant
configurations in which component wear is present, and/or in which either time
or batch replacement is used as the maintenance policy. Recall that in time
replacement a component is replaced at failure or after it has been in operation
for a predetermined length of time, whichever occurs first; in batch replacement
the component is replaced at failure or during predetermined maintenance times
that do not depend on how long the component has been in operation.
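The distinction between the two policies can be made concrete with a small sketch of how the next scheduled preventive replacement might be computed. The parameter names (`last_renewal`, `first_maintenance`, `interval`) are hypothetical and chosen only for this illustration.

```python
import math


def next_time_replacement(last_renewal, interval):
    """Time replacement: the schedule restarts whenever the component is renewed."""
    return last_renewal + interval


def next_batch_replacement(t, first_maintenance, interval):
    """Batch replacement: fixed calendar times, independent of component age.

    Returns the first scheduled maintenance time strictly after t.
    """
    if t < first_maintenance:
        return first_maintenance
    periods_elapsed = math.floor((t - first_maintenance) / interval)
    return first_maintenance + (periods_elapsed + 1) * interval
```

For example, if a component fails and is renewed at 2.5 yr, a 2 yr time replacement policy schedules the next replacement at 4.5 yr, while a batch policy with maintenance at 1, 3, 5, ... yr still replaces it at 3 yr.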

Tables I and II give the unreliability and unavailability results for four
different systems, namely a single component and (1/2), (1/3) and (2/3) active
parallel systems. The components are taken to be identical; their failure rates
are represented by Weibull distributions, i.e.


$$\lambda(t) = \lambda_0 + \frac{m}{\theta}\left(\frac{t}{\theta}\right)^{m-1}$$

with the parameters $\lambda_0$ = 0.013/yr, $m$ = 2.5 and $\theta$ = 7.5 yr. The repair rate is
given as $\mu$ = 10/yr. The design life of the systems is 5 yr. The time of the
first maintenance for the single component is t = 1 yr. For the multicomponent
systems, maintenance is performed on a staggered basis, i.e., for the two
component system the time of the first maintenance of component 1 is t = 1 yr
and t = 2 yr for component 2. For the three component system the time of the
first maintenance is t = 0.667 yr for component 1, t = 1.333 yr for component 2
and t = 2 yr for component 3. The maintenance intervals in case of batch
replacement and the age of the component at replacement in case of time
replacement are taken to be $\Delta t$ = 2 yr for all calculations.
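As a rough plausibility check on these parameters, the unreliability of a single unmaintained component can be evaluated in closed form. The sketch below assumes that the failure rate has the form $\lambda(t) = \lambda_0 + (m/\theta)(t/\theta)^{m-1}$, with the constant term and the Weibull term added together; the symbol names and this reading of the formula are assumptions, since the printed equation is only partly legible.

```python
import math

# Parameter values quoted in the text (units: years).
LAMBDA_0 = 0.013   # constant part of the failure rate, per year
M = 2.5            # Weibull shape parameter
THETA = 7.5        # Weibull scale parameter, years


def failure_rate(t):
    """Assumed failure rate: constant term plus Weibull wear term."""
    return LAMBDA_0 + (M / THETA) * (t / THETA) ** (M - 1)


def cumulative_hazard(t):
    """Integral of failure_rate from 0 to t."""
    return LAMBDA_0 * t + (t / THETA) ** M


def unreliability(t):
    """Probability that an unmaintained, unrepaired component has failed by time t."""
    return 1.0 - math.exp(-cumulative_hazard(t))
```

Evaluating `unreliability(5.0)` gives roughly 0.35, which is close to the simulated value for the single component without maintenance in Table I.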
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TABLE  I:  Unreliabilities  for  example  problems. 


System   Model*   Without maintenance          With maintenance

1/1      1        (0.3428 ± 0.0027) × 10^0     (0.1350 ± 0.0008) × 10^0
         2        (0.3428 ± 0.0027) × 10^0     (0.1350 ± 0.0008) × 10^0
         3        (0.3428 ± 0.0027) × 10^0     (0.1350 ± 0.0008) × 10^0

1/2      1        (0.9259 ± 0.0175) × 10^-2    (0.7582 ± 0.0066) × 10^-3
         2        (0.6886 ± 0.0131) × 10^-2    (0.7457 ± 0.0066) × 10^-3
         3        (0.6886 ± 0.0131) × 10^-2    (0.7573 ± 0.0067) × 10^-3

1/3      1        (0.1816 ± 0.0086) × 10^-3    (0.2990 ± 0.0039) × 10^-5
         2        (0.1114 ± 0.0044) × 10^-3    (0.2899 ± 0.0039) × 10^-5
         3        (0.1114 ± 0.0044) × 10^-3    (0.3003 ± 0.0041) × 10^-5

2/3      1        (0.2747 ± 0.0085) × 10^-1    (0.2268 ± 0.0016) × 10^-2
         2        (0.2080 ± 0.0068) × 10^-1    (0.2227 ± 0.0016) × 10^-2
         3        (0.2080 ± 0.0068) × 10^-1    (0.2257 ± 0.0016) × 10^-2

TABLE II: Interval unavailabilities for example problems.


System   Model*   Without maintenance          With maintenance

1/1      1        (0.1200 ± 0.0008) × 10^0     (0.2756 ± 0.0045) × 10^-2
         2        (0.7229 ± 0.0094) × 10^-2    (0.2714 ± 0.0038) × 10^-2
         3        (0.7229 ± 0.0094) × 10^-2    (0.2780 ± 0.0039) × 10^-2

1/2      1        (0.9397 ± 0.0174) × 10^-4    (0.7721 ± 0.0169) × 10^-5
         2        (0.7022 ± 0.0152) × 10^-4    (0.7520 ± 0.016?) × 10^-5
         3        (0.7022 ± 0.0152) × 10^-4    (0.7665 ± 0.0170) × 10^-5

1/3      1        (0.1255 ± 0.0041) × 10^-?    (0.2053 ± 0.007?) × 10^-7
         2        (0.7354 ± 0.0299) × 10^-6    (0.1981 ± 0.0072) × 10^-7
         3        (0.7354 ± 0.0299) × 10^-6    (0.2030 ± 0.0074) × 10^-7

2/3      1        (0.2728 ± 0.0034) × 10^-3    (0.2292 ± 0.0045) × 10^-4
         2        (0.2023 ± 0.004?) × 10^-3    (0.2241 ± 0.0045) × 10^-4
         3        (0.2023 ± 0.004?) × 10^-3    (0.2300 ± 0.0046) × 10^-4

* Model 1: continuous aging with batch replacement.
  Model 2: as-good-as-new repair with batch replacement.
  Model 3: as-good-as-new repair with time replacement.



The data in Tables I and II are indicative of the increases in reliability and
availability through component maintenance and replacement. Equally valuable is
the ability to examine the time dependence of the reliability and/or
availability. To this end, algorithms have been developed which allow the
generation of the time-dependent quantities along with the corresponding
confidence interval. This is done while adding less than 10% to the computing
time of a Monte Carlo simulation^3.
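One simple way to obtain such time-dependent curves from a single set of trials is sketched below. It assumes that the failure time and weight of every failed trial are stored, and that the unreliability at each grid time is estimated from the trials that failed by that time. This is an illustrative assumption; the algorithm actually used is not described in this paper, and the function name `unreliability_curve` is hypothetical.

```python
import math


def unreliability_curve(failure_times, weights, n_trials, grid):
    """Estimate unreliability and its standard error at each time in grid.

    failure_times -- system failure times of the trials that failed
    weights       -- corresponding trial weights (1.0 for analog sampling)
    n_trials      -- total number of trials, including those without failure
    grid          -- times at which the curve is evaluated
    """
    curve = []
    for t in grid:
        # Trials that have failed by time t contribute their weight, others zero.
        first = sum(w for tf, w in zip(failure_times, weights) if tf <= t)
        second = sum(w * w for tf, w in zip(failure_times, weights) if tf <= t)
        mean = first / n_trials
        variance = (second / n_trials - mean ** 2) * n_trials / (n_trials - 1)
        std_err = math.sqrt(max(variance, 0.0) / n_trials)
        curve.append((t, mean, std_err))
    return curve
```

Plotting the mean together with the mean plus and minus one standard error gives three lines per case: the estimator and the bounds of its 68% confidence interval.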

Figures 2 and 3 show the unreliability and interval unavailability
vs time for the 1/2 active parallel systems. Each run is for 1000 trials. Three
sets of curves are shown, each with three lines corresponding to the estimator
and the 68% confidence interval.
Figure 2: Unreliability versus time.
The highest unreliability and unavailability are for model 1 with no
maintenance. The results for model 2 and model 3 with no maintenance are nearly
indistinguishable and are hence shown as one set of curves. Finally, the smallest
unreliabilities and unavailabilities shown in Figs. 2 and 3 are for the
maintained systems. Here the three models are nearly
indistinguishable. As could be expected, the curves for the unmaintained systems
are concave up, demonstrating the marked effects of component aging. Where
maintenance is present the unreliability rises in a more linear manner while the
unavailability levels off toward an asymptotic value.

Similar results are obtained for standby configurations and for multicomponent
systems with more complex redundant configurations. Brevity requires
that they not be presented here.
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